Custom template for CAST
logo



Start with a short introduction about what is going on in this document

Introduction


Introduction to PDCA, CAST, and the methods contained within this document goes here.

Background


Explain some background info that is needed. Change header colours to purple?

Explain Bayesian theory briefly here. We want a distribution of forecasts, but don’t have the forecasts to build one from; this is where Bayesian theory comes in. We can estimate the posterior distribution by randomly sampling from a prior distribution.
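A toy illustration (not the document’s method) of the posterior-by-sampling idea: candidate values for an unknown mean are drawn from a uniform prior and resampled in proportion to their likelihood against a few observations. All names and values here are invented for the example.

```r
# Toy posterior-by-sampling example: estimate an unknown mean from 3 observations
set.seed(1)
obs <- c(9.8, 10.1, 10.3)                   # toy observations
prior <- runif(5000, 5, 15)                 # uniform prior on the mean
lik <- sapply(prior, function(m) prod(dnorm(obs, m, 1)))   # likelihood of each draw
posterior <- sample(prior, 5000, replace = TRUE, prob = lik)  # likelihood-weighted resample
```

The resampled values concentrate around the observed mean, approximating the posterior without ever writing it down analytically.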

Sample Data

Step through an explanation of the two-stream sample data included. Plot both oil and gas production. Pick one well to apply the methods across. Should this come before the Arps parameters? Also show the uwi.matrix table, where we summarize important parameters from each well.

Summary Table

Key parameters for each production dataset have been summarized and compiled into a table. Cumulative months on production, initial oil and gas rates, peak oil and gas rates, peak month, and cumulative volumes can be seen for each UWI. These parameters will be used in subsequent calculations and help us prioritize wells as we begin to look at learning from wells with more production history later in the PDCA process.
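A minimal sketch of how such a summary could be built in base R. The input data frame and column names (`prod`, `Unique.Well.ID`, `CD.Month`, `Oil.Rate`) and the 30.4 day/month cumulative conversion are assumptions for illustration; the document’s own uwi.matrix construction may differ.

```r
# Hypothetical sketch: summarize key oil parameters per UWI from a monthly
# production data frame (column names are assumed for this example)
summarize.wells <- function(prod) {
  do.call(rbind, lapply(split(prod, prod$Unique.Well.ID), function(w) {
    data.frame(UWI = w$Unique.Well.ID[1],
               Initial.Oil.Rate = w$Oil.Rate[1],
               Oil.Peak.Rate = max(w$Oil.Rate, na.rm = TRUE),
               Oil.Peak.Month = w$CD.Month[which.max(w$Oil.Rate)],
               Cum.Oil = sum(w$Oil.Rate * 30.4, na.rm = TRUE))  # assumed 30.4 d/month
  }))
}
```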


UWI OnProd Date Initial Oil Rate Peak Oil Rate Peak Month Cum Oil
100/05-17-081-16W6/00 2018-04-01 26.123677 328.6216 3 62958.48
100/11-17-081-16W6/00 2018-05-01 219.027416 544.5717 2 95700.09
100/12-17-081-16W6/00 2018-05-01 265.308244 538.2610 2 82928.63
102/05-17-081-16W6/00 2018-04-01 6.646233 406.3637 3 39977.40
102/12-17-081-16W6/00 2018-05-01 135.210625 439.8464 2 60875.30
103/05-17-081-16W6/00 2018-04-01 7.631636 378.4579 3 58147.41


Production Plots

Describe the Arps decline models used historically for forecasting, and which models will be analyzed here.

Arps is a common methodology for forecasting well production.

Modified Hyperbolic

Explanation of that equation and a definition using code..?

We’ll define the modified hyperbolic equation in executable form. The equation will accept decline parameters and a time series across which it can evaluate production rate. First we’ll need a function to convert secant decline percentage into nominal form.
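As a sketch of what those definitions could look like. The function names (`secant.to.nominal`, `MH`), the annual secant-effective decline inputs, the daily time basis, and the b > 0 restriction are assumptions; the document’s own definitions may differ in scaling or conventions.

```r
# Hypothetical sketch: convert a secant-effective annual decline (d.sec, as a
# fraction) into nominal form for a given b exponent
secant.to.nominal <- function(d.sec, b) {
  if (b == 0) -log(1 - d.sec) else ((1 - d.sec)^(-b) - 1) / b
}

# Hypothetical modified hyperbolic rate function, assuming b > 0 and t in days:
# hyperbolic decline until the instantaneous decline reaches dlim, then exponential
MH <- function(qi, b, di, dlim, q.abd, t) {
  Di   <- secant.to.nominal(di, b) / 365.25      # nominal daily decline
  Dlim <- -log(1 - dlim) / 365.25                # exponential tail (b = 0 conversion)
  # Switch time where hyperbolic instantaneous decline Di/(1 + b*Di*t) drops to Dlim
  t.sw <- if (Di > Dlim) (Di / Dlim - 1) / (b * Di) else 0
  q.sw <- qi / (1 + b * Di * t.sw)^(1 / b)
  q <- ifelse(t <= t.sw,
              qi / (1 + b * Di * t)^(1 / b),     # hyperbolic segment
              q.sw * exp(-Dlim * (t - t.sw)))    # exponential segment
  q[q < q.abd] <- NA                             # mask rates below abandonment
  q
}
```

The argument order (qi, b, di, dlim, q.abd, t) mirrors the MH call used later in MH.synthesize.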

PDCA

Explain the background of what it is and how it can be helpful in forecasting, especially to handle uncertainty. Define PDCA here! Explain briefly bayesian methods of achieving this? Priors and posteriors

Monte Carlo Method

Explain the place of the Monte Carlo method as our first try at estimating the posterior distribution, showing parameter distributions as defining our prior and explaining how they are each sampled. Have inputs here for Monte Carlo, build any functions required, and run. Show nice clean histograms overlaid of the sampling for b, di, and qi based on the input criteria..

# Define distribution constraints for our Monte Carlo sampling of arps parameters
# Only oil values will be defined here.

## bi values
o.bi_min <- 0.5 #Initial oil b value min value
o.bi_max <- 2   #Initial oil b value max value

## di values; as decimals in secant. Will be converted to nominal in arps equation.
o.di_min <- 0.4   #Initial oil decline percentage min
o.di_max <- 0.98   #Initial oil decline percentage max

## qi multiplier or absolute values; Multipliers will be applied to peak initial rates.
o.qi_min <- 10    #Absolute min oil rate (bbls/d) for qi
o.qi_max <- 3     #Peak oil rate multiplier to define max oil qi (3 * peak_rate)

## dlim values; as decimals in secant.
o.dlim <- 0.07    #Limiting decline percentage used in modified hyperbolic equation

# Abandonment values; as rates in production units
o.abd <- 3
g.abd <- 10

# To handle multiple distribution types, let's define a "DefineDistribution" class
# The class will contain distribution information in a list, defined in S4
# Type = Normal, Lognormal, or Uniform; min_mean = min bound for uniform or mean of lognormal 
# or normal; max_sd = max bound for uniform or standard deviation for lognormal or normal

setClass("DefineDistribution",slots = list(type="character",min_mean="numeric",max_sd="numeric"))

# Using this class, we will define uniform distributions for all of our inputs
b <- new("DefineDistribution",type = "Uniform",min_mean = o.bi_min,max_sd = o.bi_max)
di <- new("DefineDistribution",type = "Uniform",min_mean = o.di_min,max_sd = o.di_max)
dlim <- new("DefineDistribution",type = "Uniform",min_mean = o.dlim,max_sd = o.dlim)

#Let's pick the first well 100/05-17-081-16W6/00 for a peak rate to show the distribution of qi
o.qpeak <- matrix[1,"Oil.Peak.Rate"]

#Then define its distribution
qi <- new("DefineDistribution",type = "Uniform",min_mean = o.qi_min, max_sd = o.qpeak*o.qi_max)


Let’s visualize our defined distributions and overlay our samples from them. We’ll use a function named sample.any, defined in the background, that simply accepts a DefineDistribution class object and randomly samples based on its type and inputs.
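A minimal sketch of what sample.any could look like (the actual definition lives in the background code). The DefineDistribution class is restated so the snippet runs standalone; the switch dispatches on the type slot described earlier.

```r
# Restated from above so this sketch runs standalone
setClass("DefineDistribution",
         slots = list(type = "character", min_mean = "numeric", max_sd = "numeric"))

# Hypothetical sketch of sample.any: draw n random samples from a
# DefineDistribution object based on its type slot
sample.any <- function(dist, n = 10000) {
  switch(dist@type,
         "Uniform"   = runif(n, dist@min_mean, dist@max_sd),
         "Normal"    = rnorm(n, dist@min_mean, dist@max_sd),
         "Lognormal" = rlnorm(n, dist@min_mean, dist@max_sd))
}
```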


Uniform Distributions of b, di and qi

MCMC Method

Explain here and have links to jump around document? Or explain when we get to it?

CAST


Production Model Synthesis

Explain the process of forecast synthesis here, and show how it is carried out for the modified hyperbolic. Show the Arps substitution, and explain.

Combine Sampled Parameters

Let’s combine and organize our sampled parameters in a data frame. We’ll use an equation-specific function to ensure completeness and repeatability for multi-well matching.
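A hedged sketch of such an equation-specific sampler, assuming uniform inputs throughout. The function name `MH.sample.uniform` and range-vector arguments are assumptions for illustration; the document’s own sampler (MHSample, used later) instead takes DefineDistribution objects. The fixed column names are what downstream functions rely on.

```r
# Hypothetical sketch: draw n modified-hyperbolic parameter sets into a data
# frame with fixed column names (qi, b, di, dlim), assuming uniform inputs
MH.sample.uniform <- function(n, qi.rng, b.rng, di.rng, dlim) {
  data.frame(qi   = runif(n, qi.rng[1], qi.rng[2]),
             b    = runif(n, b.rng[1],  b.rng[2]),
             di   = runif(n, di.rng[1], di.rng[2]),
             dlim = rep(dlim, n))          # dlim held constant per the inputs above
}
```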

qi b di dlim
566.07793 1.8853023 0.4733630 0.07
91.85886 0.7974686 0.9570567 0.07
408.19916 1.0687344 0.8342801 0.07
632.53241 0.5597691 0.8313838 0.07
779.20478 1.0869983 0.5657342 0.07
414.54693 1.0495356 0.9655378 0.07
373.23016 1.8459167 0.8551814 0.07
186.64516 1.5416893 0.9038671 0.07
216.14082 1.0059336 0.4922665 0.07
264.73093 1.2307099 0.5992085 0.07

First 10 rows of modified hyperbolic parameters dataframe

Historical Rate Synthesis

Using the combined parameter dataframe and the modified hyperbolic equation, we can define a function to use each row of the parameter dataframe as a Monte Carlo iteration and substitute it into the Arps equation across the time interval of the current well. The function will create and return a mcIteration object for each iteration. This object is a list with slots for Iteration name, Parameters, and Synth. Some additional slots will be created and populated later as we approach cost and forecasting.

# A function, MH.synthesize, will be defined to carry out rate synthesis across the actuals 
# time interval.
MH.synthesize<-function(parameters,q.abd,t){
  #Parse up parameters and pass into MH function call
  fc <-  MH(parameters[["qi"]],parameters[["b"]],parameters[["di"]],parameters[["dlim"]],q.abd,t)
  
  #Simplify parameters into a single string representation
  parametersName<-paste(names(parameters),round(parameters,digits=2),sep = "=",collapse=" ")
  
  #Create mcIteration imitation list to improve performance and ease of combining data
  #Return mcIteration. Remaining list items to be appended by later functions
  return(list("Iteration" = parametersName,"Parameters" = parameters,"Synth"=fc))
}

# Define a time interval of actuals to synthesize across, using 100/05-17-081-16W6/00
uwi<-"100/05-17-081-16W6/00"

# Subset production including only 100/05-17-081-16W6/00
singleWell.all <- formated_production$CD[which(formated_production$CD$Unique.Well.ID == 
                                              uwi),]

# Find peak oil month for this uwi
uwi.pm <- matrix[["Oil.Peak.Month"]][which(matrix$eachUWI==uwi)]

# Refine production datatable to only include data after peak oil rate
# (guard against uwi.pm == 1, where 1:(uwi.pm-1) would evaluate to c(1, 0))
singleWell <- if (uwi.pm > 1) singleWell.all[-(1:(uwi.pm-1)),] else singleWell.all

# Create time vector starting at 0 for peak oil month
synth.time <- (singleWell$CD.Month - uwi.pm) * 30.4 # Converted to daily

# Let's apply the MH.synthesize function in an apply call to execute across each row
# of the parameters dataframe
oil.synth<-apply(MH.sampled.parameters,1,MH.synthesize,q.abd = o.abd, t = synth.time)

# Print first object of oil.synth to show mcIteration object
print(oil.synth[[1]])
## $Iteration
## [1] "qi=566.08 b=1.89 di=0.47 dlim=0.07"
## 
## $Parameters
##         qi          b         di       dlim 
## 566.077933   1.885302   0.473363   0.070000 
## 
## $Synth
##  [1] 566.0779 538.0106 513.6163 492.1628 473.1066 456.0348 440.6274 426.6323
##  [9] 413.8474 402.1089 391.2825 381.2565 371.9376 363.2471 355.1178 347.4921
## [17] 340.3206 333.5602 327.1732

Each mcIteration list, defining a Monte Carlo iteration and historical synthesis, can now be evaluated against the actual data to search for an appropriate model.

Loss Functions

How do we compare the synthesis forecasts against our actuals and determine what is a good fit and what is a poor fit?

To compare each synthesis to the historical production data observed, we need to use functions that compare predicted points to actuals for each model. These functions are commonly referred to as loss functions. The most common loss function is the L2 loss function (insert link here?), which is popular due to its easy implementation, especially in linear models. However, the L2 loss is not as useful when comparing data with significant outliers, such as oil and gas production data, because it assigns too much weight to these outliers and rejects models that do not match them. We will need to look at more robust loss functions to find models that match the historical data as an engineer would.

Soft L1

Soft L1 loss is a robust, continuous variation of the common Huber loss that gives less weight to outliers than an L2 loss, making it useful for fitting time series production data. Our Soft L1 definition will return the individual loss of each point and an average cost for all residuals input.
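A minimal sketch of such a definition, assuming the common Soft L1 form rho(r) = 2*(sqrt(1 + r^2) - 1) (as used by, e.g., scipy’s least_squares); the document’s actual version may scale differently. The `...` argument lets the same signature absorb the extra best-fit parameters that applyLoss later passes for the PDCA loss.

```r
# Hypothetical Soft L1 sketch: per-point loss plus the average cost of all
# residuals; extra arguments are ignored so it is call-compatible with applyLoss
soft_L1 <- function(r, ...) {
  indv <- 2 * (sqrt(1 + r^2) - 1)
  list(IndvCost = indv, Cost = mean(indv, na.rm = TRUE))
}
```

For small residuals this behaves like an L2 loss (rho ≈ r^2), while for large residuals it grows only linearly, which is what blunts the influence of outliers.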


To apply the loss equation we’ll define a function to apply the Soft L1 loss to the residuals of each model match. The function will include an option to apply the loss function against the log residuals, an option that we’ll take advantage of due to the non-linear nature of production data. The function will populate the IndvCost and Cost slots of each mcIteration object, and once the function is defined, we can use these slots to visualize each model’s fit.

## $Iteration
## [1] "qi=566.08 b=1.89 di=0.47 dlim=0.07"
## 
## $Parameters
##         qi          b         di       dlim 
## 566.077933   1.885302   0.473363   0.070000 
## 
## $Synth
##  [1] 566.0779 538.0106 513.6163 492.1628 473.1066 456.0348 440.6274 426.6323
##  [9] 413.8474 402.1089 391.2825 381.2565 371.9376 363.2471 355.1178 347.4921
## [17] 340.3206 333.5602 327.1732
## 
## $IndvCost
##  [1] 28.88406 32.56231 34.73612 35.90350 35.83259 35.68141 34.42292 34.88264
##  [9] 34.77318 34.58031 34.41093 34.43722 33.83672 33.68369 33.30451 32.98297
## [17] 32.87979 32.56150 32.27778
## 
## $Cost
## [1] 33.82285

With cost calculated using the Soft L1 loss function applied to the standard and log residuals for every model iteration, let’s use the average cost to find our “best fit” model out of the 10,000 iterations. We’ll plot the model against the actual oil rate, coloured by individual cost for both residual methods, to look at how each handles outliers.

# Use lapply and do.call to combine the average cost of every model object 
# into a single dataframe
oil.cost <- do.call(rbind,lapply(oil.synth,function(x){
  return(x$Cost)
}))

# Repeat for log residuals
oil.cost.log <- do.call(rbind,lapply(oil.synth.log,function(x){
  return(x$Cost)
}))

# Find index of best fit model using log soft_L1 residuals
oil.bf.log <- which.min(oil.cost.log)

# Plot this "best fit" against actuals and colour by standard and log residual cost
# To plot, we'll utilize some custom plotly functions defined in plotResources external file
oil.synth.plot <- plotly.multimodel.indvcostcolor(actuals.x = singleWell[["CD.Month"]],
                                                  actuals.y = singleWell[["Oil.Rate"]],
                                                  iterationObject = oil.synth[oil.bf.log],
                                                  scales = "semi-log")
oil.synth.plot <- layout(oil.synth.plot, xaxis = list(title = "Cal-Day Month"),
                         yaxis = list(title = "Oil Rate (bbls/d)"), 
                         title = "Soft L1 Cost of Residuals")

oil.synth.plot.log <- plotly.multimodel.indvcostcolor(actuals.x = singleWell[["CD.Month"]],
                                                  actuals.y = singleWell[["Oil.Rate"]],
                                                  iterationObject = oil.synth.log[oil.bf.log],
                                                  scales = "semi-log")

oil.synth.plot.log <- layout(oil.synth.plot.log, xaxis = list(title = "Cal-Day Month"),
                         yaxis = list(title = "Oil Rate (bbls/d)"), 
                         title = "Soft L1 Cost of Log Residuals")
# Combine into a subplot

combined.cost <- subplot(oil.synth.plot,oil.synth.plot.log,nrows = 1,titleX = TRUE,titleY=FALSE)
combined.cost <- layout(combined.cost,title = "Soft L1 Cost of Absolute vs. Log Residuals",
                        yaxis=list(title="Oil Rate (bbls/d)"))
#IMPROVE TOOLTIP???
hide_colorbar(combined.cost)


Comparing the absolute residual cost in the left plot against the log residual cost in the right plot, there is a similar relative spread in the residuals and both cost the peak rate similarly, but the log residuals do a better job of costing the sub-peak at month 9. We will carry the log residual cost through the remainder of the CAST methodology as it captures the non-linearity of the production data better. One important thing to notice before moving on is that the Soft L1 function assigns a high cost to the peak oil rate and, as a result, does not match it perfectly. This may be concerning when compared to traditional DCA matches, but given that early-time production data can be far from steady-state flowing behaviour, it is a useful feature to maintain.

Using the log residuals and the Soft L1 cost function, let’s find the top 100 models from our initial 10,000 iterations and see how they describe the ranges of our actual oil rates. We’ll colour them by average cost to compare the forecasts.
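As a sketch of that selection step (the helper name `top.fraction` is an assumption), given a vector of average costs like the ones computed above:

```r
# Hypothetical sketch: return the indices of the lowest `frac` fraction of
# costs (e.g. frac = 0.01 keeps the top 1% of matches), lowest cost first
top.fraction <- function(cost, frac = 0.01) {
  idx <- which(cost <= quantile(cost, frac, na.rm = TRUE))
  idx[order(cost[idx])]
}
```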


Top 1% of Models Using Soft L1 Loss


We can see that the top 100 forecasts do a good job of covering the range of actual oil rates, with the lowest cost forecasts tightly grouped around the actuals. We’ve shown we can use Monte Carlo sampling paired with a loss function to match historical production! Before moving on, let’s note that the majority of the forecasts are under-predicting the initial rate. This is primarily driven by the high cost of the peak oil rate using the Soft L1, but could also be improved through our input distributions. This is something we can improve upon later.

PDCA Loss

Explain this loss function, where it’s from, and show a comparison of application to the Soft L1 loss

The Soft L1 loss function does a decent job of finding a best fit model, but is there a better loss function we can use? Let’s look at a more complete loss function defined in SPE-174784-PA (ADD LINK HERE AS REFERENCE) by Fulford et al., which takes a similar approach, but .. EXPLAIN HERE

Just as we did with the Soft L1, let’s use the PDCA loss function to find the top 100 forecasts from the same 10,000 iterations. We’ll set the minimum mean and minimum standard deviation of epsilon to 0, assuming a perfect model. Again we’ll colour by average cost and compare to the Soft L1 method.

# Let's modify the applyLoss function slightly to accept the pdca bestfit parameters
# eps_min input will contain c(eps_mean_min,eps_std_min) for pdcaLoss
applyLoss<-function(iterationObject = c("mcIteration List"),
                    actuals = c("Matrix with actual data"), loss.function = soft_L1,
                    log.residuals = TRUE,
                    eps_min = c(0,0)){
  
  # Calculate residuals from model synthesis to actuals. If log.residuals is TRUE, 
  # return the log of the residuals
  if (log.residuals == TRUE){
    r <- log(actuals) - log(iterationObject$Synth)
  } else {
    r <- actuals - iterationObject$Synth
  }
  
  # Apply loss function to residuals. Returns a list of IndvCost and Cost.
  # Since pdca_loss is used, pass bestfit. Default to 0,0 as ideal bestfit.
  rho <- loss.function(r,eps_min[1],eps_min[2])
  
  # Append IndvCost and avg_cost (named Cost) to the mcIteration imitation list,
  # calling from rho list slots
  iterationObject$IndvCost <- rho[[1]]
  iterationObject$Cost <- rho[[2]]
  
  # return iterationObject
  return(iterationObject)
}

# Apply PDCA loss function
oil.synth.PDCA <- lapply(oil.synth, applyLoss, actuals = singleWell[["Oil.Rate"]], 
                    loss.function = pdca_loss, log.residuals = TRUE)

# Use lapply and do.call to combine each object loss into a single dataframe
oil.cost.PDCA <- do.call(rbind,lapply(oil.synth.PDCA,function(x){
  return(x$Cost)
}))

# Find and plot top 100 (top 1%) models using the ggplot resource. Colour by PDCA cost.
oil.synth.pdcaP1 <- which(oil.cost.PDCA <= quantile(oil.cost.PDCA,0.01,na.rm = TRUE))

#Sort from highest cost to lowest for plotting visuals
oil.synth.pdcaP1 <- oil.synth.pdcaP1[order(oil.cost.PDCA[oil.synth.pdcaP1],decreasing = TRUE)]


oil.plot.pdcaP1 <- ggplot.multimodel.costcolor(actuals.x = singleWell[["CD.Month"]],
                                           actuals.y = singleWell[["Oil.Rate"]],
                                           iterationObject = oil.synth.PDCA[oil.synth.pdcaP1],
                                           scales = "semi-log")
# Add titles and output
oil.plot.pdcaP1 <- oil.plot.pdcaP1 + ggtitle("PDCA Loss Function") +
  xlab("Cal-Day Month") +
  ylab("Oil Rate (bbls/d)")

oil.plot.P1 + ggtitle("Soft L1 Loss Function")
oil.plot.pdcaP1

Comparing Top 1% of Models Using Soft L1 and PDCA Loss


Before we move on to forecasting, let’s look at the parameter distributions for the top 1% of models to get an idea of the solution space using the PDCA loss method.


Parameters of P1 model matches using PDCA Loss method

Summarize observations from these distributions.. They are mostly lognormal. Therefore, do we need to step into MCMC to get a proper PDCA distribution going forward? Keep these in mind and compare later on..

Summarize findings in these two charts and how we’re going to carry PDCA loss going forward in the rest of the evaluation / process as it provides a better probabilistic representation. Show distribution fits of the accepted parameters? Or do this at a later time? Or here just FYI??

Cost Evaluation

This has already been done. Still need?

Forecast

Look at the best fit for the PDCA loss function, as we’ve selected it as our go-forward loss function. Show how it performs, then create a forecast from it for visualization. Next, add forecasts of the top wells. This may be where we need to start considering a few wells.. Show how it does for good production data and how poorly it does for bad production data? Or is this another document altogether?

Let’s forecast out the best fit model from our PDCA loss function to take a look at its long term projection.


We can see the forecast distribution appears to summarize our actual data well. Let’s compile the forecasts into P10, Mean, P50, and P90 curves to represent all the forecasts. We will need to define a function to accept a list of objects and return a dataframe with the consolidated forecasts, then pass it into a custom ggplot function to plot against our actuals and accepted forecasts.

## consolidate.rates will be used to accept a list of mcIteration objects with forecasts,
#  combine the forecasts into a dataframe to perform probabilistic calcs, and output a single
#  dataframe with x, P10, Mean, P50, and P90 consolidations. Only input accepted objects.

consolidate.rates <- function(iterationObjects = c("List of mcIteration objects with forecasts
                                                   to consolidate"), 
                              forecast.x = c("dataframe containing x values used in forecasts"),
                              q.abd = o.abd,trim = c("Y / N")){
  
  # For each iterationObject, use an lapply and do.call to combine into a single dataframe
  
  consolidation.df <- do.call(rbind, lapply(iterationObjects , function(itObject){
    return(itObject$Forecast)
  }))
  
  # Perform statistical calcs across each column for each timestep
  mean <- colMeans(consolidation.df,na.rm = TRUE)
  P90 <- apply(consolidation.df,2,quantile,.1,na.rm=TRUE)
  P50 <- colMedians(consolidation.df,na.rm=TRUE) # colMedians from the matrixStats package
  P10 <- apply(consolidation.df,2,quantile,.9,na.rm=TRUE)
  
  # Cut off forecasts at abandonment rate
  mean[mean<q.abd]<-0
  P90[P90<q.abd]<-0
  P50[P50<q.abd]<-0
  P10[P10<q.abd]<-0
  
  # Combine into single dataframe with forecast.x
  consolidation <- data.frame("x" = forecast.x, "P10" = P10, "Mean" = mean,
                              "P50" = P50, "P90" = P90)
  
  # Trim rows where all consolidated curves are zero
  # (indexing with -which() would drop everything when no rows match)
  if(trim == "Y"){
    keep <- rowSums(consolidation[,2:5], na.rm = TRUE) != 0
    consolidation <- consolidation[keep,]
  }
  
  # Return trimmed result
  return(consolidation)
  
}

# Let's consolidate the accepted forecasts using this function
oil.consolidated <- consolidate.rates(oil.forecast,forecast.x = t.forecast + 3, q.abd = o.abd,
                                      trim = "Y")


# Plot results
oil.plot.consolidated <- ggplot.forecast.consolidated(actuals.x = singleWell[["CD.Month"]],
                                               actuals.y = singleWell[["Oil.Rate"]],
                                               iterationObject = oil.forecast,
                                               forecast.x = t.forecast + 3,
                                               scales = "semi-log",
                                               consolidation = oil.consolidated)

# Add titles and output as iteractive
oil.plot.consolidated <- oil.plot.consolidated + 
  xlab("Cal-Day Month") +
  ylab("Oil Rate (bbls/d)") +
  xlim(0,300)
ggplotly(oil.plot.consolidated)

Let’s compare these mean, P10, P50, and P90 curves, made by aggregating individual rates at each time interval, to curves made by aggregating representative forecasts for the mean, P10, P50, and P90 on 30-year cumulative volumes. The decline parameters inside +/- 0.5 percentile neighbourhoods around each target were averaged to summarize modified hyperbolic curves for each probabilistic outcome.
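A hedged sketch of that neighbourhood-averaging step. The function and argument names are assumptions; `cum30` stands for the 30-year cumulative volume of each iteration, and `params.df` for the matching rows of sampled decline parameters.

```r
# Hypothetical sketch: average the decline parameters of all iterations whose
# 30-year cumulative volume falls within +/- `window` percentile of `target`
# (e.g. target = 0.5, window = 0.005 for the P50 neighbourhood)
representative.parameters <- function(params.df, cum30, target = 0.5, window = 0.005) {
  lo <- quantile(cum30, target - window, na.rm = TRUE)
  hi <- quantile(cum30, target + window, na.rm = TRUE)
  colMeans(params.df[cum30 >= lo & cum30 <= hi, , drop = FALSE], na.rm = TRUE)
}
```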

Re-create something like this after the MCMC algorithm. Prove why it needs to be used? This algorithm above is used for the type well methodology..

As the forecast values change from including historical data to not, be sure that the whole curve is synthesized and then cut off at forecast.x, as q, di, and b are related to those parameters..


Both charts show similar results. We can see the P10, P50, and P90 forecasts in dashed lines, overlaid with the mean in solid black. While it looked like our forecasts represented the actuals and an appropriate forecast distribution, our P50 and Mean appear to be slightly over-estimating on the late-time actuals. Let’s investigate methods at improving this distribution, namely, Markov Chain Monte Carlo methods.

Markov Chain Monte Carlo (MCMC)

Explain the concept, or link above if put there. Define the Metropolis-Hastings algorithm.

Burn-In

Use original MCMC run as burn-in.
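A minimal sketch of what that could look like, assuming the chain is the list returned by MH.mcmc and an assumed burn-in length:

```r
# Hypothetical sketch: treat the first n.burn iterations of the previous run
# as burn-in and keep only the remainder of the chain
discard.burnin <- function(chain, n.burn = 1000) chain[-seq_len(n.burn)]
```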

Jumping function and distributions

To proceed through the Metropolis-Hastings algorithm for MCMC, we need to define jumping distributions and use them to propose the next test parameter set.

# Define mcmc.eval function to evaluate sampled jump for acceptance probability. 
# Returns x_i+1 if accepted, else, xi
# eps_min input used to pass eps_mean_min and eps_std_min of bestfit for pdca_loss function

mcmc.eval <- function(currentIteration = c("mcIteration object defining x_i"),
                          sampledParms = c("dataframe of next iteration parameters"),
                          loss_function = c("loss function to evaluate acceptance"),
                          synth_function = c("synthesis function"),
                          actuals = c("data frame with actuals"),
                          actual.t = c("time to synthesize"),
                          q.abd = c("abandonment rate"),
                          eps_min = c(0,0)){
  
  # Synthesize data using sampledParms and synth_function
  proposed <- synth_function(sampledParms,q.abd,actual.t)
  
  # Call applyLoss function to evaluate the sampled parameters
  proposed <- applyLoss(proposed,actuals,loss_function,log.residuals = TRUE,eps_min)
  
  # Evaluate acceptance probability
  A <- exp(-1/2*proposed$Cost) / exp(-1/2*currentIteration$Cost)
  
  # Evaluate acceptance
  if(runif(1)<A){
    output <- proposed
  } else{
    output <- currentIteration
  }
  
  # Return the accepted mcIteration object (x_i+1 if accepted, else x_i)
  return(output)
}

# Define MH.mcmc to perform MCMC Metropolis Hastings algorithm for modified hyperbolic model
# Returns list "chain" of accepted mcIteration objects.


MH.mcmc <- function(iterations = 10000,
                    initial = c("starting mcIteration object"),
                    loss_function = c("loss function to evaluate acceptance"), 
                    jump = c("list of jumping distributions"), 
                    bounds = c("list of bounding distributions"),
                    actuals = c("actual rates"), 
                    actual.t = c("actual time"), q.abd = c("abandonment rate")){
  # Initialize chain
  chain <- vector("list",iterations)
  
  # Assign initial mcIteration object to chain
  chain[[1]] <- initial
  
  # Calculate eps_mean_min and eps_std_min from initial as bestfit for pdca loss
  r <- log(actuals) - log(initial$Synth)
  eps_min <- c(mean(r,na.rm = TRUE),sd(r,na.rm = TRUE))
  
  # Define bounds for each parameter
  bound <- lapply(bounds,function(x){
    if(x@type == "Uniform"){
      c(x@min_mean,x@max_sd)
    } else if(x@type == "Normal"){
      c(qnorm(0.01,x@min_mean,x@max_sd),qnorm(0.99,x@min_mean,x@max_sd))
    } else {
      c(qlnorm(0.01,x@min_mean,x@max_sd),qlnorm(0.99,x@min_mean,x@max_sd))
    }
  })
  
  # Perform the MCMC Metropolis-Hastings algorithm across the number of iterations
  for (i in c(2:iterations)){
    # Assign starting points of jumping distributions to previous iteration
    # Use ln values for qi and di
    jump[[1]]@min_mean <- log(chain[[i-1]]$Parameters[[1]])
    jump[[2]]@min_mean <- chain[[i-1]]$Parameters[[2]]
    jump[[3]]@min_mean <- log(chain[[i-1]]$Parameters[[3]])
    jump[[4]]@min_mean <- chain[[i-1]]$Parameters[[4]]
    
    # Sample model parameters using jumping distributions
    sampledParms <- apply(MHSample(1,jump[[1]],jump[[2]],jump[[3]],jump[[4]]),2,
                          function(x){x})
    
    # Revert log parameters
    sampledParms[c(1,3)]<-exp(sampledParms[c(1,3)])
    
    # Compare sampled parameters to bounding distributions
    if(sampledParms[["qi"]] <= bound[[1]][2] & sampledParms[["qi"]] >= bound[[1]][1] &
       sampledParms[["b"]] <= bound[[2]][2] & sampledParms[["b"]] >= bound[[2]][1] &
       sampledParms[["di"]] <= bound[[3]][2] & sampledParms[["di"]] >= bound[[3]][1] &
       sampledParms[["dlim"]] <= bound[[4]][2] & sampledParms[["dlim"]] >= bound[[4]][1]){
      # Accept and call mcmc.eval to evaluate jumping parameters and append to chain
      # mcmc.eval will return x_i+1 if accepted or x_i if not
      chain[[i]] <- mcmc.eval(chain[[i-1]],sampledParms,loss_function,MH.synthesize,
                            actuals,actual.t,q.abd,eps_min)
    } else{
         # Reject parameters and keep previous iteration
      chain[[i]] <- chain[[i-1]]
       }
  }
  
  # return chain with MCMC samples
  return(chain)
}

# Using the best fit from the initial Monte Carlo calls, perform the Metropolis-Hastings algorithm
# Define best fit

# Combine costs from oil.forecast
# Use lapply and do.call to combine each object loss into a single dataframe
bestfit.cost <- do.call(rbind,lapply(oil.forecast,function(x){
  return(x$Cost)
}))

# Find min cost as best fit
bestfit <- oil.forecast[[which.min(bestfit.cost)]]

# Define jumping distributions as DefineDistribution objects
qi_jump <- new("DefineDistribution",type = "Normal",min_mean = 0, max_sd = 0.2)
b_jump <- new("DefineDistribution",type = "Normal",min_mean = 0, max_sd = 0.2)
di_jump <- new("DefineDistribution",type = "Normal",min_mean = 0, max_sd = 0.4)
dlim_jump <- new("DefineDistribution",type = "Uniform",min_mean = o.dlim, max_sd = o.dlim)

# Combine into single list
jump <- list(qi_jump,b_jump,di_jump,dlim_jump)

# Combine bounding distributions into bound
bounds <- list(qi,b,di,dlim)

# Call MH.mcmc to perform MCMC evaluation
oil.mcmc <- MH.mcmc(iterations = 10000, initial = bestfit,
                    loss_function = pdca_loss, jump = jump, bounds = bounds, 
                    actuals = singleWell[["Oil.Rate"]], actual.t = synth.time, q.abd = o.abd)

# Need any sort of "kick away from these bounds? Or let it iterate until it gets away itself?
# Look at changing the jumping distributions as well

# Add Di !< 1 error handler in MH function as well?

oil.mcmc.plot <- ggplot.multimodel.costcolor(actuals.x = singleWell[["CD.Month"]],
                                           actuals.y = singleWell[["Oil.Rate"]],
                                           iterationObject = oil.mcmc,
                                           scales = "semi-log")
# Add titles and output
oil.mcmc.plot <- oil.mcmc.plot + 
  xlab("Cal-Day Month") +
  ylab("Oil Rate (bbls/d)") +
  ggtitle("PDCA Loss MCMC Output")

# Plot Monte Carlo PDCA plot
oil.plot.pdcaP1

# Add mcmc plot to output
oil.mcmc.plot

Comparing Top 10% Monte Carlo Matches using PDCA Loss to MCMC PDCA Loss Output

We can see that although the MCMC output may require some tuning, it returns 10,000 matches that describe the actual data very well in a probabilistic manner. They have a tighter spread around the actuals, with more forecasts available for probabilistic analysis. Let’s look at the distributions of the matched parameters to look for improvements over the top 100 matches from the initial Monte Carlo run.


Accepted match parameters using MCMC method and PDCA Loss

The resulting distributions are dependent on the jumping distributions, which need to be tuned appropriately, but the accepted distributions follow both normal and lognormal distributions. How do the probabilistic forecast outcomes look for this MCMC run?
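One standard diagnostic for tuning the jumping distributions is the chain’s acceptance rate. A hedged sketch, using the Iteration strings stored in each mcIteration object to detect when the chain moved (the helper name is an assumption):

```r
# Hypothetical diagnostic: fraction of MCMC iterations where the chain moved,
# i.e. where the Iteration string differs from the previous step
acceptance.rate <- function(chain) {
  ids <- vapply(chain, function(x) x$Iteration, character(1))
  mean(ids[-1] != ids[-length(ids)])
}
```

Rates far below roughly 0.2 suggest the jumps are too large (almost everything rejected), while rates near 1 suggest they are too small to explore the parameter space.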

Results

Probabilistic forecasts

 




Darcy Redpath
Petronas Canada